
feat: improve memory usage of zstd encoder by using our own pool management #2375

Merged: 1 commit, Nov 14, 2022

Conversation

rtreffer (Contributor)

Currently a single zstd encoder with default concurrency is used. Default concurrency causes EncodeAll to create one encoder state per GOMAXPROCS, which by default is one per core.

On high-core machines (32+) and high compression levels (32 MB per state) this leads to roughly 1 GB of memory consumption per 32 cores. A 1 GB encoder is expensive compared to the ~1 MB payloads usually sent to Kafka.

The new approach limits each encoder to a single core but allows dynamic allocation of additional encoders if none is available. Encoders are returned after use, allowing reuse, with a limit of one spare encoder to cap memory overhead. The idea is sketched below.
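A minimal sketch of the pooling idea in Go, not the exact code from this PR: the names (encoderPool, get, put, compress) are illustrative, and it assumes the github.com/klauspost/compress/zstd encoder that sarama uses.

```go
package zstdpool

import (
	"sync"

	"github.com/klauspost/compress/zstd"
)

// encoderPool keeps at most one spare single-threaded encoder and creates
// additional encoders on demand when the spare is already in use.
type encoderPool struct {
	mu    sync.Mutex
	spare *zstd.Encoder // at most one cached spare encoder
	level zstd.EncoderLevel
}

var pool = &encoderPool{level: zstd.SpeedBestCompression}

func (p *encoderPool) get() *zstd.Encoder {
	p.mu.Lock()
	enc := p.spare
	p.spare = nil
	p.mu.Unlock()
	if enc != nil {
		return enc
	}
	// WithEncoderConcurrency(1) limits this encoder to a single internal
	// state (~32 MB at high levels) instead of one state per GOMAXPROCS.
	// Error ignored for brevity; NewWriter only fails on invalid options.
	enc, _ = zstd.NewWriter(nil,
		zstd.WithEncoderConcurrency(1),
		zstd.WithEncoderLevel(p.level))
	return enc
}

func (p *encoderPool) put(enc *zstd.Encoder) {
	p.mu.Lock()
	if p.spare == nil {
		p.spare = enc // keep one spare for reuse
		p.mu.Unlock()
		return
	}
	p.mu.Unlock()
	// A second returned encoder is simply dropped for the GC to collect,
	// capping steady-state overhead at one idle encoder.
}

// compress shows the intended usage: borrow an encoder, EncodeAll, return it.
func (p *encoderPool) compress(dst, src []byte) []byte {
	enc := p.get()
	out := enc.EncodeAll(src, dst) // note: EncodeAll takes (src, dst)
	p.put(enc)
	return out
}
```

Under concurrent load the pool grows to however many encoders are in flight at once, so memory scales with the cores actually compressing rather than with the theoretical core count.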

A benchmark emulating a 96-core system shows the memory effectiveness of the change.
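The benchmark source is not included in the PR body; a hypothetical reconstruction of its shape, reusing the pool sketch above, might look like this (GOMAXPROCS pinned to 96, ~1 MB payloads):

```go
package zstdpool

import (
	"runtime"
	"testing"
)

// BenchmarkZstdMemoryConsumption pins GOMAXPROCS to 96 to emulate a
// high-core machine: with the old single shared encoder, EncodeAll would
// then allocate up to 96 encoder states. Each iteration compresses 2x96
// payloads; run with -benchmem to get the B/op and allocs/op figures.
func BenchmarkZstdMemoryConsumption(b *testing.B) {
	prev := runtime.GOMAXPROCS(96) // emulate 96 cores
	defer runtime.GOMAXPROCS(prev)

	payload := make([]byte, 1<<20) // ~1 MB, the typical Kafka payload size cited above
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		for j := 0; j < 2*96; j++ { // the "first 2x96 messages"
			_ = pool.compress(nil, payload)
		}
	}
}
```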

Previous result:

```
goos: linux
goarch: amd64
pkg: github.com/Shopify/sarama
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
BenchmarkZstdMemoryConsumption-8               2         834830801 ns/op        3664055292 B/op     4710 allocs/op
PASS
ok      github.com/Shopify/sarama       2.181s
```

Current result:

```
goos: linux
goarch: amd64
pkg: github.com/Shopify/sarama
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
BenchmarkZstdMemoryConsumption-8   	       5	 222605954 ns/op	38960185 B/op	     814 allocs/op
PASS
ok  	github.com/Shopify/sarama	3.045s
```

Side by side:

```
BenchmarkZstdMemoryConsumption-8       2        834830801 ns/op        3664055292 B/op        4710 allocs/op
BenchmarkZstdMemoryConsumption-8       5        222605954 ns/op          38960185 B/op         814 allocs/op
```

A ~4x improvement in total runtime (834830801 ns/op down to 222605954 ns/op) and a ~94x improvement in memory usage (3664055292 B/op down to 38960185 B/op) for the first 2x96 messages.

As a downside, this patch increases how often new encoders are created on the fly, and the maximum number of encoders might even be higher; however, the count should track the cores actually in use rather than the theoretical cores available.

@rtreffer (Contributor, Author)

CLA signed

@dnwe changed the title from "Reduce the zstd encoder state to pool size to one" to "feat: improve memory usage of zstd encoder by using our own pool management" on Nov 14, 2022
@dnwe added the feat label on Nov 14, 2022
@dnwe (Collaborator) left a comment

Thanks for working on this and sharing benchmarks — this change looks good to me

cc @lizthegrey

@lizthegrey (Contributor)

> Thanks for working on this and sharing benchmarks — this change looks good to me
>
> cc @lizthegrey

Ooh, thanks. This probably will let us better bin-pack our telemetry ingest processes with other more memory-intensive workloads.
